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Abstract 

Adjoint methods form a class of importance sampling methods that are used to 
accelerate Monte Carlo (MC) simulations of transport equations. Ideally, adjoint 
methods allow for zero-variance MC estimators provided that the solution to an 
adjoint transport equation is known. Hybrid methods aim at (i) approximately 
solving the adjoint transport equation with a deterministic method; and (ii) use 
the solution to construct an unbiased MC sampling algorithm with low variance. 
The problem with this approach is that both steps can be prohibitively expensive. 
In this paper, we simplify steps (i) and (ii) by calculating only parts of the adjoint 
solution. More specifically, in a geometry with limited volume scattering and 
complicated reflection at the boundary, we consider the situation where the adjoint 
solution "neglects" volume scattering, whereby significantly reducing the degrees 
of freedom in steps (i) and (ii) . A main application for such a geometry is in remote 
sensing of the environment using physics-based signal models. Volume scattering 
is then incorporated using an analog sampling algorithm (or more precisely a 
simple modification of analog sampling called a heuristic sampling algorithm) in 
order to obtain unbiased estimators. In geometries with weak volume scattering 
(with a domain of interest of size comparable to the transport mean free path), 
we demonstrate numerically significant variance reductions and speed-ups (figures 
of merit). 

Keywords:Linear Transport; Monte Carlo; Hybrid Methods; Importance Sam- 
pling; Variance Reduction; Remote Sensing 



1 Introduction 

Forward and inverse linear transport models find applications in many areas of 
science including medical imaging and optical tomography [1] , radiative transfer in 
the atmosphere and the ocean [4, 12, 14], neutron transport [6, 16], as well as the 
propagation of seismic waves in the earth crust [15]. In this paper, we focus on the 
solution of the forward transport problem by the Monte Carlo method with remote 
sensing (an inverse transport problem) of the atmosphere as our main application. 
Light is emitted from the sun and propagates in a complex environment involving 
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absorption and scattering in the atmosphere and at the Earth's surface before (a 
tiny fraction of) it reaches a detector, typically mounted on a plane or a satellite. 

The transport equation may be solved numerically in a variety of ways. Monte 
Carlo (MC) simulations model the propagation of individual photons along their 
path and are well adapted to the complicated geometries encountered in remote 
sensing. Photons scatter and are absorbed with prescribed probability depending 
on the underlying medium. The output from the simulation, e.g., the fraction 
of photons that hit a detector, is the expected value of a well-chosen random 
variable. These simulations are very easy to code, embarrassingly parallel to run, 
and suffer no discretization error (in principle). The drawback is that they can 
be very slow. Monte Carlo methods converge at a rate = {VarianceN~ 1 / 2 ) where 
./V is the number of simulations, and the variance is that of each shot fired. In 
remote sensing, the (relative) variance is high in large part because the detector is 
typically small and thus most photons are not recorded by the detector. In order 
to be effective, MC methods must be accelerated. 

Most efforts to speed MC simulations focus on reducing the variance of each 
shot. See [16, 13] or the review of more recent work (on neutron transport) in 
[9]. See also [20] for a thorough introduction to the MC techniques in computer 
graphics. In problems with a small detector, this is achieved by directing pho- 
tons toward that detector (and re- weighting to keep calculations unbiased). When 
survival biasing is used, photons have their weight decreased rather than being 
absorbed [16, 13]. Often, one uses some heuristic (such as proximity to the de- 
tector), or some function to measure the "importance" of each region of phase 
space. Splitting methods [16, 13], upon identifying that a photon is in a region of 
high importance, split the photon into two or more photons. The weight of each 
photon is then decreased. Propagating many photons with a low weight is not 
desirable, therefore splitting is often accompanied by Russian roulette. Here, if 
a photon enters a region of low enough importance, then the photon is killed off 
with a certain probability (high chance of death if the weight is low) . Typically a 
weight window is used to enforce regions of low/high importance. Source biasing 
techniques change the source distribution in order to more effectively reach the 
detector. More generally, the absorption/scattering properties at any point can 
be modified, provided shots are re-weighted correctly. 

It has long been recognized that the adjoint transport solution is a natural 
and optimal importance function [16, 13, 17, 18, 9, 19, 7]. One can use well- 
chosen approximations of the adjoint solution (typically a rough deterministic 
solution) to reduce variance. The result is a hybrid method (deterministic+MC). 
The AVATAR method uses an adjoint approximation to determine weight windows 
[19]. The CADIS scheme in [9] uses an adjoint approximation in both source bi- 
asing and weight-window determination. An adaptive technique that successively 
refines the solution in "important" regions (and uses to adjoint to designate such 
regions) is described in [10, 11]. In [16, 17], a zero- variance technique is outlined 
that uses the true adjoint solution to fire photons that all reach the detector with 
the same weight (which happens to be the correct answer). This method is im- 
practical since determining the exact adjoint solution everywhere is harder than 
determining some integral of that solution. The LIFT method [17, 18] uses an 
approximation of the adjoint solution to approximate this zero-variance method. 

We adapt the zero-variance technique to the particular problem we have at 
hand; see figure 1 for the type of geometry considered in this paper. The problem 
we consider has a fixed, reflective, complex boundary, and relatively large mean- 



free-path (MFP) in the sense that a large fraction of the photons reaching the 
detector have not scattered inside the atmosphere. The calculation of the approx- 
imate adjoint solution used to approximate zero- variance techniques is difficult 
and potentially very costly. What we demonstrate in this paper is that partial, 
"localized" (in an appropriate sense) knowledge of the adjoint solution still offers 
very significant variance reductions. More specifically, we calculate adjoint solu- 
tions that accurately account for the presence of the boundary but do not account 
for volume scattering (infinite MFP). The calculation of the adjoint solution thus 
becomes a radiosity problem with much reduced dimensions compared to the full 
transport problem. This, of course, can only reduce variance in proportion to the 
number of "ballistic" photons that never interact with the volume. Moreover, an 
adjoint solution that does not "see" volume scattering cannot be used alone as a 
variance reduction scheme for otherwise volume scattering would be neglected and 
the simulation biased, which is not allowed. When combined with simple rules 
for allowing volume scattering and sending some photons directly from the vol- 
ume to the detector, our hybrid method yields very significant variance reduction 
at relatively minimal cost. Furthermore, the methodology studied is applicable 
whenever any method is available to deterministically pre-calculate flux over any 
subset of paths. For instance, complex propagation of light in clouds and its 
importance could be pre-calculated locally and incorporated into the MC simula- 
tions in a similar fashion. This modular approach to the description of the adjoint 
solution is well-adapted to the remote sensing geometry and avoids complicated, 
global, and hence expensive deterministic calculations of adjoint transport solu- 
tions. Our treatment of the reflecting boundary described in detail in this paper 
is a first step toward modular adjoint transport calculations and their variance 
reduction capabilities in remote sensing. 



Figure 1: Mountain (1 — cos 3 x shape), cloud, sky, and detector. Dot size indicates 
relative adjoint flux strength. Large dots on right-hand-side are the detector (dot size is 
down-scaled for detector). Dot size on mountain indicates that portions of the mountain 
are shaded from the detector, and that the scattering albedo is non-constant. 

The rest of the paper is structured as follows. Section 2 presents basic infor- 
mation about the transport equation with reflecting boundary. Section 3 presents 
our main theoretical results on hybrid acceleration of Monte Carlo by determin- 
istic adjoint calculations. We adopt an importance sampling viewpoint [3] that 





is common in the statistical literature. This means we view the modifications 
to absorption/scattering as a change of probability measure and the re- weighting 
as a Radon-Nikodym derivative (Jacobian). This allows us to fit many methods 
together under one framework. In particular, source-biasing, the zero-variance 
scheme, our approximation of it, and our "heuristic" volume-to-detector adjust- 
ment are put in this light. This allows us to obtain estimates of variance as a 
function of scattering/absorption coefficients and the accuracy of the determinis- 
tic solver. 

Sections 3.1 and 3.2 recall the main ideas behind importance sampling and 
the use of adjoint transport solutions. We recall how zero-variance chains can be 
constructed and show how they can be approximated by small-variance chains. 
In the absence of volume scattering, a small variance chain is constructed in 
section 3.3. The modularity mentioned earlier in this section is implemented by 
a regularization methodology introduced in (41) in section 3.4.1. The Surface 
Adjoint Importance (SAI) method, used to incorporate the adjoint solutions that 
accurately describe the surface defined in section 3.3 in a scheme that also handles 
volume scattering, is described in detail in section 3.4. The variance reduction 
and speedup that can be gained from the proposed methodology are presented in 
section 4. Several details in the derivation and the proof of the results of section 
3 and the numerical implementation of the simulations of section 4 are postponed 
to Appendix A. 

2 Transport with Reflecting Boundaries 

Let X C M. d (d = 3 in practice and d = 2 in our numerical simulations) be 
an open (spatial) domain with smooth boundary dX. Denote X U dX by X. 
For x G X photons will have velocities v G S^ 1 , the unit sphere, and we call 
the pair z = (x, v ) G Z. When x G dX we separate directions into incoming 
and outgoing. With v x the outward unit normal vector at x G dX we have 
r± := {(x,v) : x G dX,±v ■ v x > 0}. Note that z always is interpreted as the 
pair (x,v), and for example Zj = (xj,Vj). 

Photons will be cast along rays, and travel until they hit the boundary. We 
define the forward and backward propagation times as t±{z) := mint>o{x ± tv G 
dX}. We also define the forward and backward spatial and phase-space propaga- 
tions x±(z) := x ±t±(z)v, z±(z) := (x±(z),v). The rays themselves are denoted 
by a starting point and direction, r(z) := {x + tv : < t < t + (z)}. 

Define an integral over 2 := 2UT_ U T + by 

[ f(z)dz:= [ f(z)dz+ [ [ f(z)dfi(x)dv, 
Jz JZ J§ d -! JdX 

where dfx the surface measure on dX and an inner product by 

(/, 9) = J f(z)g(z)dz. 

Some functions are defined only, for example, on T_. In that case we extend the 
function to Z by setting it equal to zero off of T_. 

Light traveling through X encounters an absorption cross section a a (x), scat- 
tering kernel 6(x, v—>v'), and scattering cross section a s {x) := Jgd-i 0(x, v^v') dv', 
which is assumed independent of v. The total cross section a := o a + a s . The 



exponential of a is denoted by 

E a ( Xl ,x 2 ) : = e - /J" 1 "" 2 ' 

where v\ = x 2 — x\ := (x 2 — x\)\x 2 — ^ll^ 1 - We define E aa , E Ug similarly. Once a 
photon collides with the boundary, it is scattered with probability a(x). In that 
case, the probability distribution Q(x,v-tv') determines the new direction. This 
implies 



f Q(x,v-*/)di/ = 1. 



>v x -v'>0 

We model photon flux density u in our medium with source s by 

v • V 'xu(z) + a(x)u(z) = Ku{z) 

_ KMrM + .(,) (1) 



where 



z € Z 



Kf(z) = [ 9(x,v'^v)f\ z (x,v')dv', 
Kf{z) = a{x) @{x,v'^v)\v x -v'\f\ T+ {x,v')dv' z G T . 

Ju x -v'>0 

Since the transport problem is linear, we normalize s so that 



(2) 



r_ 



s(x, v)dfi(x)dv = 1. (3) 



Multiplying the identity v ■ V x u(x — tv, v) + a(x — tv)u(x — tv, v) = Ku(x — tv, v) 
by the integrating factor E a (x,x — tv) and integrating t from to T-(z) we find 
that u satisfies the following integral transport equation: 

CO oo 

u = LKu + Ls, so that u = ^{LK) n Ls = L^{KL) n s, (4) 

n=0 n=0 

where (with z G Z U T + ) 
Then (4) motivates us to define i\j solving 



f T -^ E (x x (z)) 

Lf(z):= / E a (x, x - tv)f\ z (x - tv, v) dt + ^ X ~ {Z > f\ r _ {z- (z)). 

J0 \Vx-(z)-V\ 



ipo = KLip + s, so that ip = '^2(KL) n s and u = Lip Q . (5) 



n=0 



The decompositions (4) and (5) of the transport solution into components cor- 
responding to increasing orders of scattering is standard in forward and inverse 
transport theory. We refer the reader to e.g. [2, 5, 16] for additional details. 



2.1 Coefficient assumptions and measurement setup 



The function g(z ) := g(x, v) describes the phase-space representation of the detec- 
tor. We will see that the Monte Carlo detector is defined as g(z) := g(z)\v x -v\~ l . 
We assume that the source/detector are nonzero only on the incoming/outgoing 
boundaries: supp(s) C r_, supp(g) C T+. Finally, we assume that the detec- 
tor is non-scattering, a(x) = for (x,v) G supp(g). We also have a = on 
the sky and left/right sides to model photons that escape our domain. These 
assumptions are satisfied for source radiation coming from the sun and detectors 
on high-elevation planes or satellites. The methodology we present could easily 
be adapted to detectors placed in the volume. 

Our measurement is the phase space integral (g, u). All numerical methods 
employed will approximate this integral. When g(z) = v x ■ v (for x on the support 
of the detector), the detector is measuring photon flux. This corresponds to 
counting Monte Carlo photons that pass through the support of g. 

2.2 Adjoint solutions and operator decomposition 

We will see that it is the adjoint operator (and its kernel) that is needed to 
define the Markov chain transition kernels in MC simulations. We denote adjoint 
operators by *, and adjoint is defined with respect to the inner product (•, •). The 
methods used in this paper rely on a decomposition of the operator {LK)* into 
C* (ray Casting) and S* (Scattering) operators. We have 

/~r+(zi) 

C*f{z 1 ) : = / E a (x l ,x l +tvi)f\ z (x l +tv 1 ,v 1 )dt + E^x^x^z^f^^z+^x)), 
Jo 

when z\ G Z U T_, and 

6(xi,Vl^V2)f\z(xi,V2)dv2, zi £ Z 

d-i 

®(x 1 ,v 1 -^v 2 )f\r_(xi,v 2 )dv2, zi G T + . 

:i -V2<0 

While C* / L*, we still have C*S* = (KL)*, which implies of course that 
SC = KL. We also note that C*g = L*g. The notation xi,vi,zi, and X2,t>2,^2 
is suggestive of the fact that these variables will later represent photon posi- 
tions/velocities at the first, second, third, etc. . .position. 
Define the adjoint ip* by 

oo oo 

^* = C*S*ip* a + C*g, so that Vo = ^{C*S*) n C*g = C* ^2(S*C*) n g. (6) 

n=0 n=0 

Then definitions of ip a , ijj* imply (s, if)*} = (C*g, ijj ), and therefore 

(s, r ) = (C*g, fa) = (L*g, Vo) = (u, g). (7) 

In other words, the adjoint solution ipo{z) is a weight giving the "importance" of a 
source at point z on our measurement (u, g). This is the first fundamental reason 
for the use of the adjoint solution in Monte Carlo transport; see e.g. [16, 17] and 
also Theorem 3.1 below. Note that if)* can be shown to solve 

Clr + =S*(ClrJ + ff. 



S*f(zi) : 



Is 



The relation in (6) motivates us to define tp* solving 

oo 

tf=S*C*tf+g, so that ^ = J2(S*C*) n 9- 

?1=0 

We also have the relations 

V>: = c*y*, i/>i=S*ti + g. (9) 

Both ip* and tp* appear naturally in Monte-Carlo transport. When construct- 
ing transition kernels (that determine casting/direction changes), one will be a 
normalization constant for the other (implicitly or explicitly). We make the dis- 
tinction explicit due to the following heuristics. We may think of ip*(zi) as the 
incoming importance at z\. To it we associate an arrow directed into point x\ with 
direction v\. ip*(xi,vi) is the outgoing importance at (x±,vi) since it is the inte- 
gral of incoming importance at all possible points (x2, v\) along the ray x\ + tv\. 
Likewise, away from the support of g, ip* = S*ip*, meaning that the incoming 
importance at x in direction v is the integral of all importance exiting x. 

It is important to note that all chains described here alternate casts with 
direction changes. Casts move particles from a point (x±,vi) to a point (x\ + 
tv\ , v±) on a one-dimensional line segment while direction changes move particles 
from a point (x2,vi) to a point (^2,^2) on a (d — 1)— dimensional sphere. One 
could alternatively try devising a scheme that moves Zj — > Zj+i directly. This 
significantly increases computational cost since, given z±, Z2 may lie anywhere on 
the d dimensional manifold {x\ + ii^i : < t < t + (z±)} x S d_1 . Thus, sampling 
Z2 directly would require handling a d dimensional data structure rather than a 
1 dimensional and a (d — 1) dimensional data structures for alternate casts and 
direction changes. This is our main motivation for introducing the operators C* 
and S* rather than directly working with (LK)* . 



2.3 Transport when a = 

When a = (the "surface" regime) , we have C* = C s (with kernel he* ), S* = S s 
(with kernel ks") where 

C s f( Zl ):= / /|z(*i + tt;i,t;i)dt + /|r + (2+(*i)), 
Jo 

( 0, zx G Z 

SS f^) : = \ a (xi) I 6(xi,t;i->U2)/|r_(xi,«2)d«2, z x G T+ 

\ Jv xi -V2<0 

We then define tpf as the solution to 
and let Vo == C s V>f • Since = 0, 

and for z G T + , 

S s C s il) s i (z 1 ) = a(x 1 ) j &(x 1 ,v 1 ^v 2 )'<P!(z + {x 1 ,V2))dv2. 

In other words, we can solve for ipf by paying attention only to the boundary, and 
then propagate it to compute ip*. 



3 Monte Carlo with Reflecting Boundaries 



Monte Carlo consists of simulating transport one photon at a time. Photons 
propagate along straight lines until they interact with the underlying medium, 
where they are either absorbed or scattered into another direction, or reach the 
detector where they are collected. It can be shown that photon paths terminate 
(with probability one) after finitely many collisions. Following [16], paths will be 
written uj = (zo, . . . , z T -±, (x T ,D)). So the initial point zq = (xo,vq) G Z, and 
subsequently we choose x\ by casting a ray, then v\ by changing direction, then 
X2, then v 2 and so on until absorption. At this stopping time r, x T is chosen, and 
then v T is set equal to d, the "dead velocity." The chain is now terminated. Let 

n : = {{(x ,v ), (x T _i,w T _i), {x T ,d)) : vj = xf^T^Xj}. 

All casts and direction changes (including "death" ) are made by drawing ran- 
dom variables. We thus introduce a probability measure on the set of paths U. 
We note that t(uj) = n} = {t = n} is the set of paths terminating after 

n — 1 scattering events. 

A probability measure on Q is a map P from the (measurable) subsets of Q into 
[0, 1]. It corresponds to a method of choosing paths. Given a set i C O of possible 
paths, PL4] is the probability that a path lies in A. P[r = n] := P[{w : t(oj) = n}} 
is the probability that the chain terminates at step n. Let D denote the paths 
that end up hitting the detector. Then P[D] is the probability of hitting the 
detector. With the indicator function 1d(w) defined as 1d(w) = 1 if oj G D and 
zero otherwise, we have the notation 

P[D}= [ dP(w)= [ l D {u)dP(u)=E{l D }. 
J d Jn 

Here, E{} denotes mathematical expectation (ensemble averaging) w.r.t. P. 

3.1 Monte Carlo and Importance Sampling 

The analog measure P a closely follows the physics of photon propagation (at least 
one reasonable physical model for photon propagation). Monte Carlo simulations 
based on this measure have very large (relative) variance because most of the 
photons do not reach the detector. Several standard methods exist to modify 
the measure to steer more photons toward the detector in an unbiased way, i.e., 
in a way that does not modify the detector reading (u, g). We start with a 
presentation of the analog chain and then present the main ideas of importance 
sampling to reduce variance in MC simulations. We also present the (standard) 
survival biasing chain, which forms a basis for comparison and a component in 
our composite SAI chain. 

3.1.1 Analog Sampling 

We first define the analog transition kernels fcg.* and feg* , associated to the oper- 
ators C* and S*. The analog ray casting transition kernel is 

k ( c*{z 1 ->■ x 2 ) : = [8 r ( zi )(x 2 )cr(x2) + S(x 2 - x + (z 1 ))] E er (xi,x 2 ). 

Above, 5 r ( zi }(x 2 ) is the "delta function" in M. d concentrated along the ray r(z\). 
It forces x 2 to be along the path x\ + tv±, t > 0. 5(x 2 — x + {z\)) forces x 2 to be 



on the boundary at x+(zi). Since 



-^-(1 - E a (x,x + tv)) = a(x + tv)E a (x,x + tv), (10) 

we have J ^ k^* (z\ — > X2) dx2 = 1. This means that the probability of termination 
during an analog casting event, 

Vc* {zi) : = 1 - / fcg,. {zi -»• x 2 ) dx 2 = 0. 
Jx 

Next, the direction change kernel is given by 

• a a \ \ \ f Q(x2,V 1 -^V 2 )(t(x2)~ 1 , X 2 G X 

b x ' ' \ a(x 2 )Q(x 2 ,v 1 ^v 2 ), x 2 edX. 
We find that the probability of termination during a direction change is given by 

Fb 1 ^ \ 1 - a(x 2 ), x 2 £ dX. 

These kernels lead to the standard algorithm 1 [16]: we sample zo from the 
normalized source s (written zq ~ s), then cast according to k^ t . Then particle is 
absorbed with probability p a s ». If the photon is not absorbed, we change direction 
using a pdf proportional to kg* (kg* doesn't integrate to one, so it is not a pdf). 
Then we cast again and so on until we are absorbed. This defines the chain oj = 
((xq,vq), . . . , (x T _i, v T -i), (xr,^)). At this point, we define the random variable 
modeling detector reading: 

Note that our assumptions on a imply p a s * = 1 on the support of g. The simplest 



Algorithm 1 Analog 

1: Draw z ~ s, set j ' <— 

2: while Vj y^d do 

3: Draw Xj +1 ~ k^*(zj — > •) 

4: With probability p|*(xj + i, Vj), Vj + i = d 

5: if Vj + i 7^ d then 

6: Draw Vj + i from a distribution oc kg*((xj+i, Vj) — > ■) 
7: end if 
8: j <- j + 1 
9: end while 

10: Record £ a (w) = g( x j, v j-i) 



example is when the detector measures flux through the surface. In this case 
g(z) = Ion supp(g) and we simply count MC photons hitting the detector. Chains 
generated using algorithm 1 induce the analog probability measure 

dP a (w) = s(z )k%*{z -»• xi)/c^((xi,t;o) -> t>i) 

X ■ ■ • X (Z T _ 2 -»• X T _i)fc§. ((X T _1, U T _ 2 ) -> U T -l) (12) 

x (z T _i — >■ x T )p a s * (x T ,v T -i) dzQ • • • dz T _i dx T . 



It is instructive to write this out in the case where photons only interact with the 
volume, and then reach the detector. In this case (keeping in mind that p a s * = 1 
on the detector, and ignoring the dz • • • dx r ), dP a becomes 

s(z )5 r ^ zo - ) (xi)E (T (xo,xi)9(xi,vo^vi) ■ ■ ■ S r ^ ZT _ 1 )(x T )E tT (x T -i, x T ). (13) 

We recall the normalization (3) from which we deduce that J n dP a = 1. Note first 
that the above chain is terminated with a cast and use of kg* . Second, note that 
the measure above is a multiplication of singular measures and must be carefully 
defined. E.g. recall that we must fix z\ in order for k^ t {z\ — > X2) to be well 
defined; see the proof of theorem 3.1 (in the appendix) for details. 
The next theorem shows that the chain £ (u;) is indeed unbiased. 

Theorem 3.1. With E a {•} denoting expectation under the measure P a , we have 

E«» {&} = <«> g). 

This result is standard in the absence of a boundary; see e.g. [16]. Its proof is 
sketched in the appendix. Algorithm 1 is a method for producing one draw (shot) 
£ a (u;) from P a . As is "always" done with Monte Carlo techniques, we produce N 
draws {£, a {oj l )}f =1 in an identical fashion, then estimate 

1 N 

i=l 



3.1.2 Importance sampling 

Here we give a quick introduction to importance sampling and show how it relates 
to our scheme. Given the analog probability measure dP a , we can use a different 
measure dP for sampling. With £ a defined as in (11), and £ := £ a 

dP a 



dP a 



(u, <?>=E a {U 



Jn 



dP 3 



dp 



dp 

dP = E{|}., 



where the Radon-Nikodym derivative 



dP a 



dP 



(the Jacobian) must be defined on 



supp(£ a ). This happens precisely when, for any measurable A C f2 such that 
P a (A) > 0, we also have P(A) > 0. In this case we say that P a (or dP a ) is 
absolutely continuous with respect to P (or dP). Then we can estimate the mea- 
surement in one of two ways: 

1. (u, g) « jf J2iLi Ca(^i)) where are sampled according to P a 

2. (u, g) pa jj J2iLi where uii are sampled according to P. 

For uncorrelated samples, the variance in either case (£ = £ a or £ = £) is 



Var 



N ^ 

i=i 



Var{£} 



N 



So it will suffice to study Var{£} and the time needed per sample to calculate 
speedup. Here are a few expressions for variance of a random variable £ : — >■ R: 

„ 00 „ 

Var{£}= / (£-E{£}) 2 dP = ^ / (e-E{£}) 2 dP = E{e 2 }-EU} 2 . 



The behavior of g puts some fundamental limits on variance for the analog 
chain. Let D C SI be the set of paths that reach the detector, and suppose 
the "real life" detector measures flux through the surface, g(z) = v x ■ v (on its 
support). Then g = 1 on the support of g so that £ a ( w ) = and the Monte 

Carlo detector acts as a photon counter. Then, for P a [D] <C 1, 



VMU y/P*[D]{l-P*[D]) 1 



(u, g) p a (z?) y^W 

So for a small detector, the relative variance of analog MC is quite large. Since 
both methods are unbiased, variance is reduced if and only if 



0<E a {ea 2 }-E{f 2 }.= r 



1 - 


d pa 










dP 





dp a . 



The goal of importance sampling is thus to make dP » dP a on as much of 



dP 



su PP(£a) as possible. However, dP must integrate to one and we must have 
defined on D (which we don't have a priori access to). 

We now describe some simplified importance sampling situations. They serve 
to bring intuition to our model. Suppose first that we devise an algorithm whose 
corresponding chain has measure 

f GdP a ((j), ojeD 

with 1 < G < P a [D]- 1 . Now f = t D /G and 

Var{U_. 1-P a [^] 



Theoretically, we can set G = P a [D] _1 and achieve infinite variance reduction, 
i.e., find a zero- variance method which gives the right result with probability 1. 
Assuming knowledge of P a [D] of course means we know the desired integral we 
are attempting to measure and thus is not practical. Moreover, practically, we 
cannot know how to increase dP uniformly (and exclusively) for the a priori 
unknown uj G D, and thus some error is made. But this simple argument shows 
the possibility of achieving zero- variance MC. This will be utilized later in this 
section after we introduce importance sampling based on the adjoint transport 
calculations. 

More practically, we may still devise schemes that increase the draws from 
some known, controlled, set B C Q, "stealing" them from Q\B. In the simplified 
case where we change the measure on 5 by a multiplicative constant b and on 
Q\B by an appropriate constant so that mass is preserved, we obtain that 



Ai>f ^ / &dP »> weB £, , \ KM' UJ & B 

dP(w) = < l-bP*[B] ,pa/ x „ eR c £M = < 1-P*[B] r( x RC 

[ i_pa [B ] (u), w G B , [ 1-bP^B} ^), co e B . 



(15) 



Here, we have defined B c = Q \ B. Then, assuming that £ = Id (i-e., that the 
detector counts photons), 

E { ? } P = nB]+ l-b^[B] F&[D B% (16) 



Let us now optimize the choice of b to maximize variance reduction. Variance is 
significantly reduced when B is a good approximation of D, i.e., when P a [DnB] is 
relatively close to P a [D]. How good an approximation we need may be quantified 
as follows. We recall that P a [DnB] = P a [D]P a [B\D]. We remind the reader that 
P[B | D] is the conditional probability of the event B given D. In other words, it 
is the probability that u 6 B given that the path ui reaches the detector. 
Let us introduce the factors 

P or L^J> 7 P a [jD ]' a P*[B\D\ P a [D] ■ 1 ' 

Starting from (16), some algebra shows that 

Var {I 2 } = P a2 [D] (p a [B\D] (A + - l) . 

Minimizing the above expression allows us to maximize the variance reduction. 
We find that for the optimal value of /3 op t equal to ( v / 7( v / 7 + v^)) -1 ) the minimal 
variance is given by 

Var \i 2 } = P a2 [D](p a [B\D}(^ + V^f ~ l). 

I J min V / 

This shows that the maximal variance reduction is given by 



Var{£ 2 } 



Var 



1 -P a [£>] 



P a [D] P a [B\D](^f + ^E) 2 - I 



(18) 



When B = D, we find that 7=1 and a = 0. In that case, we find again that the 
above value is +00 and that the chain £ has zero variance. 

In practice however, it is unlikely that a will be small. Since P a [D] is small, we 

pa I jyi 

find that a is approximated by p a [B\D\P a [D] . Since P a [D] <C 1 for small detectors, 
a is likely to be large even for reasonable approximations of D by B. It turns out 
that even in that case, we can still expect good variance reductions. When o> 1 
and 7 close to 1, we observe that 



Var{£ 2 } 



Var 



{e} 



^ 1 ^ / P a [B\D]P a [D] y 2 
Popt ~ ,fan~ Wl-P a \B\D})) ' 



l-P a [B\DY ^ p ^7 \7(l-P a [5|D]) 



We observe that for a choice of b close to P a [-D] _1 /3 opt , we obtain very reasonable 
variance reduction when B is chosen so that 1 — P a [S|D] <C 1 but not necessarily 
a < 1 which is equivalent to 1 — P a [i?|iD] < P a [D] and imposes constraints on B 
that are not practical. We use the notation a < b to denote "a < Cb for some 
C < 00." In figure 2, we show the variance reduction (18) for several values of 
P a [f?|D] (left) and its approximation by (19) (right), which works quite well when 
a is large and not so well when a is small as expected from theory. In all plots, 
P a [D] = 0.002, which is close to our actual simulations in section 4. 

Note that (15) may be improved as follows when we know the existence of a 
set C such that C(~)D = 0. Paths in C do not reach the detector and thus we want 
to give them a vanishing weight. The measure in (15) then needs to be modified 
as 



dP(w) = < 



b&P a {u), uj€B 

0, u € C (20) 




Figure 2: Variance reduction by importance sampling. Left: The ratio 
VRR(b, P a [B | D}) := Var{£} /Var||| is plotted vs. bP a [D] for a number of differ- 
ent P a [B\D}. Right: max fe>0 VRR(b, P a b3 | D]) is plotted versus P a [B\D]. In both 
cases variance is calculated in the regime (16). 



This leads to 

E {^ 2 }p = \^ [D nB] + 1 ~T-1l*m [C] p * [D n B% (21) 

The situation (21) is preferable to (16) when P a [C] > 0. The optimal value 
for b is obtained as before with 1 — 7P a [D] in the definition of a replaced by 
1 - P a [C7] -7P a [£>]. 



3.1.3 Modular importance sampling 

Finding the "right" set B is a difficult task: photons making it to the detector may 
undergo complicated interactions with the volume scatterers and the reflecting 
boundary. Moreover, in most settings of importance sampling, the derivative 

(oj) does not take only two values as in the simplified setting (15). B should 



dP a 



dP 3 



dP 



dP 

be replaced by one or several subsets B = B\ U B^ U . . . where the weight 
should be allowed to vary. 

The modularity that we mention in the introduction consists of choosing sets B 
by appropriate approximations to the adjoint transport solution that are relatively 
simple to calculate and have a large intersection with D, the set of paths reaching 
the detector. For instance, a subset B\ could correspond to particles reaching the 
detector after interacting with the boundary, B2 to particles reaching the detector 
after one scattering event in a cloud, and so forth. 

The importance sampling schemes considered in this paper are all based on 



changes of measure of the form 
We summarize them here. 



dP a 



dP 



(w) that generalize that seen in (16) or (21). 



The survival biasing method defined in section 3.1.4 below eliminates the 
volume and surface absorption of photons. Hence it "steals" shots from a subset 
of photons that were absorbed before reaching the detector, and moves them into 
some subset of D. We are therefore in the regime (21) (at least approximately as 



dP a 

dP 



(oj) is not constant on B); see also theorem 3.2 below. 



The heuristic volume scattering method denned in section 3.1.5 below 
scatters (with probability < 1) photons in the volume directly toward the detector 
(rather than using the phase function 9). It has measure dPheu,q v uniformly larger 
than dP s b on the set of paths that scatter once in the volume then hit the detector. 
It modifies the measure (often increasing it) on the set of paths that have their 
last interaction in the volume, then reach the detector. It steals shots from the 
set that interact last with the boundary, then hit the detector. A very rough 
approximation would put us in the regime (16) with B\ ieu = {uj G D : x T ~\ G X}. 

The ideal zero- variance chain derived in section 3.2.1 below sends all pho- 
tons to the detector. It uses an exact calculations of the adjoint solution to sample 
only from D, and is in the regime (14) with G = P a [D] _1 ; see theorem 3.3 be- 
low. When only approximate expressions for the adjoint solution are available, 
the zero-variance chain may be modified to yield small variance chains. However, 
in practice, the calculation of both the adjoint solution (step (i) in the abstract) 



and of the change of measure 
expensive. 



dP a 



dP 



(to) (step (ii) in the abstract) is prohibitively 



The SAI method defined in detail in section 3.4 below is our main example 
of a modular approach to importance sampling. In that method, we devise a 
subset B = B\ U B2 , where B\ corresponds to particles that do not undergo any 
volume scattering and where B2 corresponds to photons that are sent straight to 
the detector after undergoing volume scattering. We will see that the method 
involves the calculation of an adjoint solution in the absence of volume scattering 
and that the calculation of 



dP a 



(w) on Bi, B2, and £l\(Bi U B2) is relatively 
straightforward. Moreover, we will see B\ U B2 is a good approximation of D 
when volume scattering is not too large although B\ and B2 individually are not 
necessarily good approximations of D. In the simplified calculations in (19) and in 
Fig.2, we observe that for P a [Bi\D] = 0.45 and P a [B2\D] = 0.45, we may ideally 
have P a [i?i U-B2I-D] = 0.9, with a potential variance reduction of order 10 whereas 
the variance reduction from B\ or from B2 alone would at best be a factor 2. 



3.1.4 Survival Biasing 

Here we define a classical chain where no photons are absorbed in X, although 
some are possibly in dX (for use in our application where we have perfectly 
absorbing boundaries). This will be related to the analog chain via importance 
sampling. Define 

k s ^(zi -)• x 2 ) : = [5 r ( yZl )(x2)(J s (x2) + S(x 2 - x+(zi))] E as (xi,x 2 ), 

{ 0(x2,vi-m) 
a s (x 2 ) ' X2fe 
a sb (x2)Q{x 2 ,vi^v 2 ), X2 £ dX, 

with a sb (x) = 1 when a(x) > and a sb (x) = when a(x) = 0. We then have: 

ib 1 \ n 1 ib / \ I 0) x 2 G X 

Pc .(z 1 ) = and Ps*(x2,v 1 )={ 1 _ aSb{x ^ % ^ QX 

The Radon-Nikodym derivative is obtained by formally dividing dP a by dP s 6, 
where dP a is defined in (12), and dP s b is defined analogously. Since, for (x T , v T _i) G 
supp(g), a(x T ) = a sb (x T ) = 0, the Radon-Nikodym derivative, restricted to 



{uj : x T £ 7Ta;Supp(g)} is 



dP 



i a 



dP sb 



= E a _ as (X ,X!, X T )7 ajSb (xi) • • • 7 a , s fe(x T _i) 



1, 

a(x)/a sb (x), 



x G X, 
x £ dX. 



(22) 



Defining 



U ■ = € 



dP : 



dP s6 



we have 



E 8b {U} = E a {Z a } = (u, g). 



Theorem 3.2 (Variance reduction by eliminating absorption). We have 



with equality if and only if absorption is zero (with probability = 1) on analog 
paths that reach the detector. 

Proof. Since both methods are unbiased, it will suffice to consider the expected 
value of the random variable squared. Since E a - as (x,y) < 1, (with equality if 
and only if a = a a along the path from x to y), and for j < r, / y a ,sb(xj) < 1 (with 
equality if and only if a(xj) = 1), 



Note that since photons are not absorbed, their path length could be much 
longer than in standard analog sampling. This could result in a decrease in our 
figure of merit (see section 4). In nuclear reactor applications, the multiplication 
of particles with very small weights becomes a real issue and several techniques 
such as Russian roulette have been developed to address this [17, 9]. In remote 
sensing applications with a reasonably large mean free path, and the chance of 
escape into the atmosphere, this is much less of an issue and thus is not considered 
in this paper. 

3.1.5 Heuristic volume scattering adjustment 

In this section, we present a very simple (and classical) direction change kernel 
to be used as part of any modular scheme to handle volume scattering (we use it 
as part of SAI). We modify the volume scattering kernel in order to direct pho- 
tons toward the detector. When a large fraction of photons reach the detector 
with only zero or one volume scattering event (i.e. when a s is small), this is a 
reasonable method. Although better methods do exist, we include this to demon- 
strate our modular variance reduction paradigm. We introduce a regularization 
parameter q v £ (0, 1]. We draw from our modified method a fraction of the time 
approximately proportional to 1 — q v . 



Var{U} < Var{^ a ] 




with equality occurring only under the specified conditions. 



□ 



Let 

x do be the midpoint of the detector (assume one detector). For q v £ [0, 1], 
x 2 € X, put 



qheu{x 2 ,v 1 ) : = (q v - 1) 



9(x 2 ,v 1 ^-x do - x 2 ) 

\\e(x 2 ,v 1 ^-)\\ L oo 



+ i. 



(23) 



For x 2 € X, let fv{x 2 — > v 2 ) be uniform on {v £ S d 1 : r(x2, f ) Pi 7r x supp(<7) 7^ 0}. 
We define the heuristic scattering adjustment direction change kernel by 



So we are aimed toward the detector via /y with probability 1 — qheu- The ratio 
of 9 to its L°° norm in (23) is proportional to the analog probability of heading 
toward the detector; certainly we don't want to send particles toward the detector 
when the analog chain would never do that (the Radon-Nikodym derivative would 
be zero in this case). 

For convenience, we calculate here the change of measure associated to the 
chain that uses survival biasing on the boundary and volume, as well as heuristic 
direction changes in the volume. This is the heuristic chain 



To produce one draw u) from the heuristic chain, we follow algorithm 2. This 
could then be used to estimate {u, g). In this paper, we combine the heuristic 
chain with an adjoint-based method. See section 3.4. 

3.2 Adjoint-based importance sampling 

In this section, we first show how knowledge of the exact adjoint transport so- 
lution allows us to devise a zero-variance method. This generalizes to the case 
of transport with boundaries well-known results for volume scattering [16, 17]. 
When the adjoint solution is approximated, e.g., by a deterministic calculation, 
we show how a non-analog MC chain may be generated. We show that when the 
approximation of the adjoint solution is of order h for a "mesh" size h <C 1, then 
the MC variance is of order h 2 in ideal circumstances (and larger in more complex 
geometries). We will present in section 3.4 a hybrid method that only calculates 
important parts of the adjoint solution at a minimal computational cost while still 
offering sizable variance reductions. 

3.2.1 The zero-variance chain 

Here we describe a chain that uses an exact adjoint solutions (ip*, ipo) an d has zero 
variance. We show that draws from the chain can be made in a manner similar to 
analog, with modified scattering cross-sections. Obtaining ip*, ip* is more difficult 
than solving our original problem (they must be obtained everywhere), hence as 
we mentioned earlier this method is impractical. 




dP sb 




lheu,sb{xi,Z 2 ) 



x 2 e x 



x 2 £ dX. 
(24) 



Algorithm 2 Heuristic scattering adjustment 
1: Draw z ~ s, set j ' <— 
2: while Vj ^ 3 do 
3: if x 2 G X then 

4: Compute qheu(%j+i,Vj) using (23). With probability 1 — qheu set switch ^— true 

5: if switch then 

6: Draw t> 2 ~ fv( x 2 — > •) 

7: else 

8: Draw t> 2 ~ /cf 6 , ( (#2 , f l ) — •) 

9: end if 
10: else 

11: With probability pf*(x 2 ) = 1 — a s6 (x 2 ), u,-+i f 

12: end if 

13: if fj+i 7^ then 

14: Draw from a distribution oc ^((a^+i, i>j) — > •) 
15: end if 
16: j<-j + l 
17: end while 

18: Record £heu(u) = 



Our Markov chain formulation phrases the use of the adjoint in terms of transi- 
tion kernels. This was done explicitly in [16] and implicitly in [17]. Unlike [16] we 
explicitly write out the modified ray-casting and direction-change kernels. Unlike 
either scheme, we explicitly use both the incoming tp* and outgoing tp* adjoint 
solutions. In [16] ip* was used (implicitly) and in [17] both were used (implicitly). 
Define 

k* c ,(zi ->x 2 ) := [5 r(2l )(x 2 ) + 5(x 2 -x+(2i))] E (T (x 1 ,x 2 )^ t ' 2 ' ' ' 



k* s *((x 2 ,V!) ^v 2 ):=< 



V'o(^l) 

9{x 2 ,v 1 ^v 2 )— -, x 2 e X, 

^i{x 2 ,vi) 

( w , ^0(^2,^2) ^ av 

a{x 2 )Q(x 2 ,vi->v 2 )— -, x 2 € dX. 



So we modify the casting by the ratio of the importance of the point we will enter 
to the importance of the point we are exiting. We modify direction changes by 
the ratio of the importance entering x 2 to that exiting. 

Using the equations defining the adjoint solutions, we have 

ph.(zi) : = l-J_kZ.(zi^ x 2 ) dx 2 = l- ^f^y = 0, 

p s *(x 2 ,Vi) : = 1 - / fc 5 *((X2,Vl) ->• V 2 )dV2 = 1 —. r- 

J&- 1 ^*{x 2 ,vi) 
§(x 2 ,vi) _( 1, (x 2 , v\) £ supp(g) 



(25) 



t(j*(x 2 ,vi) \ 0, otherwise. 

The last equality used the fact that ip* = g on the support of g (since a 
there). So all photons reaching the detector are collected. 



We also define a new (normalized) source 



In other words, we bias the photons leaving the source so that they leave in 
directions with high importance. 

Note that sampling is done by alternately casting along a line, then changing 
direction, just as in an analog scheme. Since (off the detector) both k^* and kg* 
integrate to one, they are probability densities. To sample from k* c * {z\ — > x 2 ) it 
therefore suffices to cast along the ray r(z\) and integrate k* c , as we go. Once 
the integral is greater than some uniform random number u ~ U[0, 1], we scatter. 
The relation (10) along with algorithm 1 show that this same procedure is done 
in standard analog Monte Carlo. Sampling from kg* may also be done just as in 
analog Monte Carlo. 

We define dP* in the same manner as dP a . This yields, 



dP*(cj) = s*(z )k* c ,(z Q - 
x dzo • • ■ dx T . 



xi)k s *((x 1 ,v ) ->■ vi)--- k* c ,(z T _i ->■ x T )p* s ,{x T ,v T ^i) 



It is instructive to see that most terms involving ip* cancel in the calculation of 



the Radon-Nikodym derivative 



dP a 



dP* 



. Restricting ourselves to paths that do not 



interact with the boundary and ignoring dzo ■ ■ ■ dx T , dP*(o;) takes the form: 



•5(^0)^0(^0) 



■V(zo) 



(x 1 )E a (x ,x 1 



X K{z 1 ){x2)E a {xi,x 2 ) 



Vi{x2,vi) g(x T ,v. 



6(x 1 ,v -yvi) 



^o(zi) 1p*(x T ,V T -l) 



= {s, ipo) s(zo)5 r t yZo )(xi)E a (x ,xi)6(xi,vo->v 1 )5 r ( Zl )(x 2 )E a (x 1 ,X2) ■ ■ -g(x T ,v T -i). 

This easily combines with (13) to yield (26) below for the Radon-Nikodym deriva- 
tive 



dP a 



dP* 



restricted to paths that do not interact with the boundary. 
More generally, for all paths, which account for both volume and boundary 
interactions, we verify (after careful algebra) that the Radon-Nikodym derivative 
(restricted to the set D) is still given by 

dP a (s, r ) 



dp* 



g(x T ,v T -!)' 



(26) 



Since (for (x T ,v T -i) 6 supp(^)), p^* (x T , v T -\) = 1, the appropriate random vari- 
able to measure is 



dP 3 



dP* 



dP 3 



dP* 



9(x t ,Vt-i) 
p's* {x T , V T —l) 



In the event of highly scattering media, it would be advantageous to use a 
scheme that would reduce the number of scattering events seen by a photon. 
More generally, we would hope that a less expensive route to the detector could 
be taken. Unfortunately, the next theorem shows that the zero-variance scheme 
cannot do this. Fortunately, it also shows that the modified chain does not take a 
more expensive route to the detector. We remind the reader that PL4 | D] is the 
conditional probability of the event A given D. In other words, it is the probability 
that oj G A given that we reach the detector. 



Theorem 3.3 (Identical Paths). Let A C f2 be measurable, and let D C Q denote 
the paths that end with (x T ,v T -i) £ supp(g), then 



P *[A] = £^ (Xr ' K - l)dpa 



In the special case g = 1 on supp(g), then P*[A] = P*[A\D] = P a [A \ D] . In other 
words, the paths taken to the detector in the modified scheme are the exact same 
as in the analog scheme. 

The following corollary follows by letting A = {r = n}. 

Corollary 3.4 (Constant Collision Ratios). 

f T=n g(X T ,V T ^)dP* 



P*\ T = n} = 



In the special case g = tsupp(g), then P*[r = n] = P a [r = n \ D] . In other words, 
the number of collisions photons have before hitting the detector is the same in the 
analog or zero-variance scheme. 

These results show that the zero- variance chain (when the MC detector counts 
photons) is precisely in the regime (14) with G = P a [.D] _1 . In other words, we 
increase the measure uniformly (an optimum amount) on the set of paths that 
reach the detector. 

Proof of theorem 3. 3. 

p ' M ^/J^l dp *^// (x "' F "- l)dp '' 

This proves the first part. When g = 1 on supp(g), the above becomes 
This leads to 

oo oo 

P*[D] = P a [(r = n) n D] = (s, ^> £ P*[r = n] = (s, ft). 

n=0 n=0 

The result then follows from the definition of conditional probability. □ 



3.2.2 Approximations of the zero variance chain 

After seeing the zero-variance chain, one immediately gets the idea of using ap- 
proximations to Y>*, if)* in a variance reduction method. Assuming one can gener- 
ate these approximations (e.g., by using a deterministic solver), it still remains to 
construct a bona fide chain (a probability density integrating to 1) and to sample 
from the corresponding chain (this is step (ii) in the abstract). We show here that 
an arbitrarily coarse adjoint approximation can be used in an approximation of 
the zero- variance scheme. 

The approximation dP h of dP* can be used in the so-called "asymptotic 
regime" where ~ V'o- I n this setting, the calculation of the adjoint solu- 
tions and the sampling from the measure dP h may be prohibitively expensive as 



the number of degrees of freedom necessary to adequately represent the adjoint 
solution is typically very large. 

This approximation can also be used to guide photons along paths to the 
detector. There, it only needs to perform sufficiently well and no longer needs 
to be very accurate. In this case we draw from dP h (a "not-necessarily-good" 
approximation of dP*) only an optimized fraction of the time, while e.g. using 
the analog measure to sample from the rest of the time. In many cases good 
speedup is obtained. The latter methodology is implemented by the SAI chain in 
section 3.4. 

Assume one has ipf ~ ipt, and tpo ~ Vo- Then, following the recipe of the 
zero- variance chain, we could set 

kc*{z\ ->■ x 2 ) : = [6 r ( Zl )(x 2 ) + S(x 2 - x+(zi))] E a (x 1 ,x 2 ) : ' 



fcs,((x 2 ,ui) -»• v 2 ) : 



af , {x 2 ,v 2 ) 

9{X 2 , V^V^—r- -, X 2 G X, 

I \c\< v ^0(^2,^2) ^ r, v 

a(x 2 )&(x 2 ,v 1 -^v 2 )-j^ '-, x 2 G dX. 

Vi(x 2 ,vi) 



(27) 



However, many difficulties arise. For example, it is not clear that ip^ 1 , ^ are 
nonzero whenever ip* , ip* are. In this case, the modified chain will not send 
photons along all paths that the analog chain does, and the result will be biased. 
Assuming we take care of this problem, a more insidious issue arises: What are the 
values of the integrals f x k^, dx, f kg* dv? If both integrate to one (away from 
the detector), then, as in the zero variance chain, we use them as pdfs and sample 
directly from them (say with an accept-reject method, or by pre-calculating a cdf). 
If they integrate to less than one, this gives us a probability of absorption, and we 
need to know this. If they integrate to more than one (very possible), then one 
can still sample from a pdf proportional to them. However, this proportionality 
constant must be known when the Radon-Nikodym derivative is calculated. 
To formalize this, we propose modified kernels of the form 



k a ch (z x -> x 2 ) : = [<5 r ( Zl )(x 2 ) + 5(x 2 - x + (z 1 ))] E%(x 1 ,x 2 ), 
k h ch (zi ^x 2 ): = k a ch {zi -»• x 2 ) 



u»// T , lU ,^._| O h (x 2 , Vl ^v 2 ), x 2 e X, (28) 

k h sh {{x 2 , Vl ) ^v 2 ): = k a sh ((x 2 , Vl ) -> v 2 ) ^ 2 ' V2) 

The new coefficients (E^,O h ,a h ,Q h ) are chosen such that the kernels integrate 
to one or less. In most cases, away from the detector, one would choose the 
integrals to be one (so particles are not absorbed). We also assume that we 
have approximations of the detector and source, g h « g, s h w s. Note that the 
calculation of such coefficients may prove to be quite expensive numerically (this 
is step (ii) introduced in the abstract). In some sense, the coefficients in (28) may 
be seen as normalized versions of the coefficients introduced in (27). However, 
finding rules to calculate this normalizing constants efficiently is non-trivial. We 
will address this issue in the following section in the simplified setting where 
volume scattering is absent. 



Note that such normalizing constants would easily be calculated if k^ h and 
kg h were the kernels of operators C h and S h , respectively, and tpo an d were 
obtained by solving the equations = C h S h ip% + C h g h and $ = S h C h ^ + g h . 
Indeed as in (25), we would then obtain that 



(zi ->■ x 2 ) dx 2 = 1 - 



1- / k h ch {, 
Jx 

/ kc h ((x 2 ,vi) -)• u 2 ) d-u 2 = 1 



CVf(£l) =( 

Y^O^,^) 



(29) 



However, such operators 5 ft and C h would preserve the singularities of the trans- 
port equation (primarily propagation along straight lines) and are therefore cannot 
be discrete. Their kernels in (28) are infinite dimensional and cannot be reduced 
to (finite dimensional) matrices. Any reduction to a matrix form involves approx- 
imations that will modify the structure of the singularities in (28) and render the 
integrals in (29) more complicated to estimate. 

In any case, assuming that our construction (28) defines a bona fide change 
of measures (i.e. P a is absolutely continuous with respect to P h ) so that the 
Radon-Nidodym derivative restricted to {uj : z T £ supp(g)} is well defined, then 
the latter is given by: 



dpa 



(S h , V£> 8{ZQ) 



d ph 

la,h{zi,z 2 ) : 



Pa,h(xO, ■ ■ ■ ,X T )la,h{zi, Z T -l), 



Pa,h{xi,X 2 ) : = 



g h (x T ,v T -i) s h (z ) 

x 2 € X 

x 2 e dx, 

7a,h(zi, . . . , Z n ) := 7a,fc(zi, • • • , Zn-l)7o,h(^n-l, Z n ). 

E a (x!,x 2 ) 



6 h (x2,vr*2) ' 
a(x2)&(x2 ,vtt*2) 

. a h (x2)O h (x2,v-t^H2) ' 



(30) 



E%{ Xl , X2 y 

Pa,h(xo, ■ ■ ■ , X n ) ■ = Pa,h(xo, ■ ■ ■ 

Since (6,Q,a, E a ) 7^ (9 h , @ h , a h , E%) a priori, the telescopic cancellations in (26) 
no longer occur in (30). We then set 



dP 3 



dP A 



so that E h {^} = (u, g). 



When « tp* we expect Var {^ h } *C 1. The rate of convergence is studied 
here in the ideal setting where the following assumptions are satisfied: 

Assumptions 3.1. Assume there exist p, C > such that, for all small enough 

h, 

(i) \{s\ tf)/(u, g) - 1| < Ch, 



4" 1 

gh 



4- 1 



+ \la,h{z\,Z 2 ) ~ 1| + \Pa,h(xi,X 2 ) - 1\ < Ch 



(iii) P h [r = n]< Ce~f n 

(iv) supp(^*) = supp(^o), and supp(^) = supp(Vf) 



Assumptions (i) and (ii) follow if all approximations are 0(h) in the uniform 
norm, all coefficients are bounded below (on their support), and the support of 
the true and approximate coefficients are the same. The third assumption (m) is 
standard in the transport regime with not-too-small mean free path and simply 
indicates that long-distance, multiple-scattering paths are improbable. Assump- 



tions (ii), (iv) ensure 



dP a 



exists everywhere. 



dP h 

Verifying assumptions (i) and (ii) is extremely constraining. However, in this 
idealized setting, we have the following theorem, whose proof is postponed to 
section A. 2. 

Theorem 3.5 (Convergence in the asymptotic regime). Assume that we meet 
Assumptions 3.1. Then as h — > 0, 

Var[i h ] < (u, g) 2 C'h 2 , 

for some C > depending on C and p. 

This result shows that importance samplings with small variance can be achieved 
provided that accurate approximations to adjoint transport solutions are available. 

The aim of all remaining sections and the introduction of the SAI method is 
precisely an attempt at using an adjoint approximation that (i) is inexpensive to 
calculate; and (ii) generates a measure that is both easy to sample from and has 
small variance. 



3.3 Reflecting boundaries without volume scattering 

Here we discretize the operator appearing in section 2.3. This operator arises in 
the limit of zero volume interactions (a — > 0). We first present a discretization of 
the adjoint solution in section 3.3. f and then show how the adjoint solution can 
be used to obtain a non-analog MC algorithm with small variance in section 3.3.2. 



3.3.1 Surface-limit adjoint problem 

In this limit, we have ip* —> ipf. Here, at discretization level h, we approximate 
4>i ~ ipf and Vo ~ ifto- I n this section we assume the boundary is sufficiently 
smooth (of class C 3 ). 

To simplify computation of our numerical solution we make the assumptions 

B(x,v-H)') = 1 Ux . v>0 (x,v)k(x,v'), g(z) = \v x -v\g (x), 

so that g(z) = go(x). We recall that Ja is the "indicator" function of the set A. 
The result is that ipf is then a function of position only. This significantly improves 
the speed of solving the adjoint problem, as well as the memory requirements for 
using it. Theoretical results in this paper do not need this assumption, which we 
make here matter of convenience. 

We will now discretize the coefficients and approximate the operator appearing 
in section 2.3. For z\ G T + , 



S'CipKzx) = a(xi) / K(x 1 ,v 2 )^t{z + (x 1 ,V2))dv 2 . 

J v xi -t> 2 <0 



Notice that S s C s f is function depending only on x, and in fact only on the 
boundary values of /. Since g depends only on x, tpf = J2T=o(^ t 'C s ) k g will 
depend only on x. We thus define 

<p(x) : = ifi\ T+ (x,v). 

We find that ip : dX — > R satisfies the equation 

<p = Q<p + go, Qf(xi) := a(xi) K(x 1 ,v 2 )f{x + (xi,v 2 ))dv 2 . 

J v xi -v 2 <0 

In discretizing this operator, and integrals over directions in general, we use 
the change of variables, 



f(z + (x,v))dv = / f(x',v)d,N(x,x')d f i(x > ), 

v x -v<0 J dX 

d u N(x,x) := -- r . 



\x — x\ 



The term d u N is normal derivative (at x) of the free-space Green's function for 
the Laplacian. One can show (see e.g. the section on double-layer potentials in 
[8]) that for x, x' € dX, v x -(x' — x) < \x' — x\ 2 . Therefore it is in fact an integrable 
function. When d = 2 we have more. 

Lemma 3.1. When d = 2, if dX is C k+2 , then d u N(x,x') is C k {dX x dX). 

The proof is postponed to Appendix A. 3. We now discretize the operator 
Q. First split the boundary into non-overlapping segments {dXj}-^ with dXj 
centered at Xj, with measure \dXj\ < h. Denote by Rf the (orthogonal) projection 
of / onto the space of piecewise constant functions (constant on each segment 
dXj). We also think of Rf as a vector in M Np and Rfj its components. Then, 
after the change of variables (31) we have (at gridpoint x«) 

Qf(xi) = a(xi) / K,(xi,x - x,j)d u N{xi, x)f(x) dfi(x) 

JdX 



IdX 

a(xi) ^2 \dXj\K(xi,Xj - Xi)d v N(xi,Xj)f(xj) 

0<j<N p -l 



(32) 



:=Y,Q H ijRfi- 



This implicitly defines the matrix Q h . So long as dX is C 2 , the above integrand 
L 1 (bounded in two dimensions), hence we are justified in approximating it as 
such. 

We now define our discrete approximation to if as the piecewise constant func- 
tion (vector) (p h solving 

= Q V + Rg- (33) 
We then define approximations tp^ « if)*, ^ « i[>* by 

$(x,v): = <p h (x), $(z-(x,v)):=$(x,v), (x, v) G T + . (34) 
The next proposition is used to apply convergence theorems to the SAI chain. 



Proposition 3.1. Suppose d = 2, \\a — a \\l°° ^ h, \\k — k \\l°° ^ h. Then as 
operators : ^(dX) — > L°°(dX), we have 

\\Q-Q h \\<h. 

Furthermore, assuming \\Q\\ < 1, \\Q h \\ < 1, then we have 

\\rt-ri\\L~(T +) <h. 



Proof. The first inequality follows by a bound on the coefficients of Q—Q , keeping 
in mind that the apparent singularity is actually a bounded function in dimension 
2. The second follows from <p = ^ =0 Q n Rg, f h = T,n=o(Q h ) nR 9i and repeated 
application of relations similar to ah — ab = (a — a)h + a(b — b). □ 

In our implementation, we have chosen to represent angular integrals as in- 
tegrals over the boundary. This works for two reasons. First, as our adjoint 
solution depends only on position it is convenient to evaluate these sums. Second, 
if instead a discretization were chosen that was uniform in angle, then (with only 
finitely many angles) one would often miss the (small) detector in evaluation of 
the integral. 



3.3.2 Surface-adjoint approximations and non-analog chains 

The SAI chain makes use of the approximate surface adjoint solutions ipf, tp^ from 
(34) . They are used almost exactly as in the zero- variance scheme. 

We define the transition kernels following the zero-variance recipe (section 
3.2.1). Keeping in mind ipg(z-(x,v)) = tp^(x,v), we have 

k^ h (z -> x) : = 5(x — x+(z)), 

so photons are cast from one boundary point to another with no absorption, 
exactly as they are in the continuous case. No discretization error occurs with 
casting. To define the scattering kernel k* sh we first recall the zero-variance kernel, 
which, since tpg(z-(x,v)) = ipf(x,v) = (f(x), takes the form: 

k s *((x,v in ) ->v) = a(x)K{x,v) —- . 

<p{x) 

We now discretize directions on every segment dXj. We recall that Xj is the 
center of dXj. Let be the set of directions best approximated by 

Vij := {v G S d_1 : argmin \v — x^ — Xi\ = j}. (35) 

k 

With \Vij\ the measure of this set, we see that, roughly speaking, N(xi,Xj) ~ 
|Vij|/|9Xj|. With this in mind, we present our method for selecting direction. 
Suppose we are at x\ £ dXi, with incoming direction Vi n . First we select a target 
region dXj using using the discrete pdf 

j ' y a(xi)K(xi,Vij)d u N \xi,Xj)\dX ' S Xj \ ■ (36) 



Notice that we use the grid center-point x\ instead of x\. So, for fixed x\ G dXi 
we may calculate the pdf by taking the component- wise product of the i th row of 



Q with (p h , and then divide by {^> h )i- Because of this and (33), the above discrete 
function does indeed sum (over j) to one, so it is a pdf. In a second step, v is 
selected from a uniform distribution on Vij. This defines a direction leaving Xi, 
pointed into the domain, and toward dXj. Note though that a shot leaving x\ in 
direction v may not point into the domain since the normal vectors to dX at x\ 
and Xi are not exactly equal. To correct for this, we introduce a family of rotation 
operators {TZ x ,y '■ (x,y) G dX x dX}, such that 7t Xi ,x'X v ) = v ' ■> where v' is a 
rotation of v that points from x\ into the domain. In two dimensions, the obvious 
choice (which we use) of v' will ensure v' ■ v x i = v ■ v Xi . In three dimensions another 
reference vector (besides the normal) must be pre-selected at each point. Since 
both Xi and x\ belong to dXi, the operator 1Z x . x i. is close to the identity operator 
(this generates a "small" rotation). 

We may now define our scattering transition kernel at arbitrary points by 
referring back to the kernel at grid-points. For x\ G dXi, v'j G Vij, our scattering 
transition kernel is 

k% h ((x'i,v in ) -> v)) = k h sh ((xi,K-l x ,(v in )) -> n-^Xv'j)), 
where for v G Vij 

h \dXj\ (p h (x+(xi, Vij)) 

k S h{{xi,v in ) v) : = a(xi)K(xi,Vij)d l/ N(xi,Xj)—— i J . 

\Vij\ <P a {Xi) 

To put ourselves in the framework (30) we define the discretized coefficients: 

a h : = Roe, g h := Rh, E h a = E c = 1, (38) 

@ h (x 2 ,v 1 ^v 2 ) : = k h sh ((x 2 , Vl ) -> v 2 ) h ) jj^j^ » ■ (39) 

a h [x 2 ) ip h (x + (x 2 ,v 2 )) 

Notice that if we ignore the rotation 1Z, we have (for x[ G dXi, v'j G Vij), 
6 (x^io-tt/-) : = K(x i ,v ij )d 1/ N(xi,Xj 



\dXj\ 

I V ij I 



However, in general, the ratio of <p h will not cancel due to rotation. The Radon- 
Nikodym derivative 



dP a 



-jpft is then given by (30). 
The next lemma shows that this transition kernel is a pdf. 

Lemma 3.2. 

k* sh ((x'i,v)^v')dv' = l-Rg{ Xi ). 



L 



vi -v'<0 



Proof. For x\ G dXi, 
[ k* h ((x'i,v in ) ->v')di/ = [ k$ h ((x i ,K- 1 j(v in ))^Tl- 1 xl (-i/))dv' 

Jv x ,-v'<0 Ju x ,-v'<0 

i i 

= k* sh {(xi,K~} ,(v in )) ->• v)dv. 

Ju x .-v<0 

This follows since rotations preserve measure and {v' : v x i ■ IZT^ x ,(v') < 0} = 
{v : v Xi ■ v < 0} by our choice of 1Z. To integrate the last term, we notice that 



k%h((xi,w) — > v) is constant for v £ Vij. Therefore, using (33) 




N p -1 

kg h ((xi,w) v)dv = ^2 a(xi)K(x i ,Vij)d v N(xi,x j )\dX j 



cp h (xi) 



<P h i 




independent of the incoming direction w. 

In the best of cases, the SAI chain meets the hypothesis of theorem 3.5. 

Theorem 3.6. Assume that for h small enough, there exist C, p > such that 

(i) We have the bounds \a/a h - 1| < Ch, \0/O h -\\<Ch 

(ii) The boundary dX is C 3 and strictly convex 
(Hi) P^r = n] < Ce~ pn for some p > 

(iv) supp(tp) C supp{(p h ) 

Then the hypothesis of Theorem 3.5 are met and 



The proof of the theorem can be found in Appendix A. 3. 

Remark 3.1. Assumptions (i) and (ii) are here to simplify the derivation of the 
result, which may hold in more general settings. In general, assumptions (i), 
(iv) (which together imply supp((/9 /l ) = supp((/?)) require our discrete mesh to be 
chosen to coincide well with the support of the coefficients. Moreover, assumption 
(i) requires that the rotation caused by 1Z causes little change in the value of (p h . 
Smoothness assumptions on ip would provide this. Assumption (iii) means that 
multiple scattering is not dominant and holds in our model cases since we have 
a = on the left /right sides and the sky. 

Remark 3.2. Often physical coefficients such as the detector support are discon- 
tinuous and do not match up exactly with the grid. If g h = 1 on its support, 
p h has a very large jump at the boundary of this support. This means that the 
mismatch in tp h due to rotations will sometimes be large. Also, a non-convex 
boundary will cause issues at points where the curvature changes sign. These 
difficulties all occur at a finite number of points, and lead to an error contribution 
of 0(h) in (40). 

Remark 3.3. Note that when Q(x,v— >v') = for v or v' a grazing angle (i.e., 
\v ■ v x \ or \v' ■ v x \ close to 1), we verify that the rotations 1Z can be set to identity 
for small h and one can verify that k^ h ((xi,Vi n ) — > v) in (37) also generates a pdf 
for x\ close to x«. 

3.4 The Surface Adjoint Importance (SAI) method 

The SAI method is a modular method using the surface- adjoint approximation. 
In the absence of volume interaction, SAI becomes the zero-variance technique 
when h — > that we saw in Theorem 3.6. In the presence of limited amounts of 




(40) 



volume scattering, we will show that SAI properly modified (a "heuristic module" 
is added) can be used for significant variance reduction and speedup. 

For the rest of the paper, we define dP h as the measure obtained by approx- 
imating the zero-variance chain in the absence of volume scattering, i.e. with 
setting a = 0. It is thus defined via (30) with the coefficients given in (38). As 
we saw in section 3.3.1, solving the radiosity equation for the adjoint solution in 
the absence of volume interaction is much less costly than solving a full transport 
equation accounting for volume scattering. 

When a > 0, neglecting volume scattering as we did in our definition of dP h 
causes problems and -jj^- does not always exist. This occurs due to the fact that 
the analog chain sends some photons to the detector after experiencing volume 
interactions, but the chain generated by dP s does not. Thus dP s , or its approxi- 
mation dP h , cannot be used directly for variance reduction as they provide biased 
estimates of (g, u). 

3.4.1 SAI-Heuristic Importance Sampling Scheme 

For these reason, we propose the following regularized scheme: For q = (q v ,q s ), 
with q v £ (0, 1], q s G [0, 1], construct the measure 

dP q : = (l-q s )dP h + q s dP heu ^. (41) 

This means we will fire photons using heuristic (replacing the latter by any unbi- 
ased scheme would work) with probability q s , and use dP h « dP s with probability 
1 — q s . When q s = we do not account for volume scattering and thus cannot 
obtain an unbiased estimator. When q v = 1 we are using the SAI approximation 
combined with survival biasing. 

The algorithm based on (41) is our main example of modular calculation of the 
adjoint solution. Here, volume and boundary scattering are uncoupled. When few 
particles undergo both volume and boundary scattering, then the above measure 
can have a very small variance. Here, we have defined dP h as an approximation to 
the zero- variance measure dP s if only surface scattering were present. The volume 
scattering dPheu,q v is still handled in a very crude fashion. A more accurate 
calculation of the adjoint solution accounting for volume scattering would provide 
larger variance reductions. In the presence of highly scattering clouds for instance, 
dPheu,q v would have to be replaced by a more accurate approximation of volume 
scattering. Yet, the structure of (41) would remain the same. 

We then see that three parameters need to be chosen: h, q s , and q v . The 
regularizing parameters q s , q v should be chosen as a function of the mean free 
path. Ideally we could choose q s = (q v has no effect then) when the MFP is 
infinite. With finite MFP we must use q s > (in fact close to one, even when 
MFP~ 16 times the domain diameter). As MFP decreases, both q s and q v should 
decrease to allow for more analog shots that account for complex volume or volume 
+ boundary interactions. The parameter h should then be chosen to maximize 
the figure of merit: Small values of h generate large computational cost (due 
to the expense of the deterministic solve) for limited variance reductions since a 
significant variance comes from shots that interact with the volume. Simulations 
show that very small values of h yield no measurable improvement in variance. 

Note that the asymptotic regime is no longer a good description of the above 
method. Instead, different subsets of paths to the detector are chosen, and dif- 
ferent methods are used to increase the probability of their occurrence. Figure 
3 shows both boundary and volume photons being directed toward the detector. 



Figure 3: Boundary and volume interactions handled by different modules 



The boundary photon was directed using dP h . The relative size of the adjoint 
solution on the boundary is indicated by relative dot size. Note that the adjoint 
solution allows us to account for complex boundary interactions. 

The details of the SAI algorithm are as follows. First we produce N draws 
{u' l }^ =1 from P g using algorithm 3, then estimate (u, g) ~ iV -1 where 

£ 9 " 



dP a 



dP 



(expression derived below). When we draw from k^ h , k^, h in algo- 
rithm 3, shots are never absorbed until they reach the support of the discretized 
detector g h . 



Algorithm 3 SAI 



1 

2 
3 
4 
5: 
G: 
7 
S 
9 

10 
11 
12 



With probability 1 — q s , set switch ^— true 
if switch then 

Draw z from a density oc s(z)ip^(z) 



while Xj supp( 



Draw Xj + i 



k ch (zj 



do 

— >■ 



(In this case we simply set Xj + i 
Draw v j+ x ~ k* sh ((x j+ i,Vj) •) oc Q h (x j+ i,v j ^-)^(x j+ 
Set j <- j + 1 
end while 
Set Vj <— d 
else 

Draw bj ~ dPheu,q v using algorithm 2 
end if 



X+(Zj)) 



i- 



We now derive an expression for 



had a volume interaction), 



dP a 



dP„ 



where 



dP a 

~3FZ 



dP a 



dP Q 



dP a 



dP 



Whenever dP h = (say the photon 



(restricted to (x T ,v T -i) G supp(g)) is given by 

dP a 



dP s 



1 



is given by (22), and 



(j r 

dP 

KLL heu 



HPT 



dP 

KLJ - heu, i 



dP, 



dP s 



is given by (24). When dP h ^ 0, 



we have 



dP a 




d pa 

dp' 1 




dP a 

dp fe 






(1 - q s ) + q s 


HP 

^ * 1 heu,q v 

dP fc 


(1 - q s ) + q s 


HP 

v 1 1 heu,q v 

dP 3b 


dP sb 
dP fe 



So we need expressions for 



dP a 



dP n 



and 



dP„ 



dP" 



dP s 



dP a 



dP a 



dP n 



Note that 



dP a 



dP" 



is given by (30), but simplifies since E h = 1, and the fact that when dP h / we 
have necessarily taken a path such that all Xi £ dX , which makes the expression 

dP s 



for 7 aj h (a term in 



dP a 



dP h 



) "simple". Also note that 



d pa 



is given by (22) and 



that ja^sb is "simple" for the same reason 7^ was. We therefore have 
dP a (s h ,^) s(z ) 



dp h 



g*(x T ,v T )s\z j EaiX0 >' 
a(xi)Q(x 1 ,v -^vi) ■ 



, x T ) 

a(x T - 1 )@(x T - 1 ,v T - 2 -^v T - 1 ) 



dP 



sb 



a h (x 1 )Q h (x 1 ,v ^v 1 ) ■ ■ ■ a h (x T ^ 1 )Q h (x T ^ 1 ,v T _2^v T ^ 1 ) ' 
s(zq) 



g h (x T ,v T ) s h {z ) 
a sb (xi)@(xi, vq— >v 1 



E as (x , ■■■ ,x T ) 

■ a s6 (x r __i)9(2; r _i, v t - 2 -^Vt-i) 



a h (xi)@ h (x 1 ,v ^vi) • • • a h (x T -i)Q h (x T -i,v T -2-^v T -i) ' 



Even though these expressions are complicated to write explicitly, we empha- 
size that their computational cost is rather minimal compared to the overall cost 
of solving a transport equation by Monte Carlo. 

3.4.2 Optimal parameter selection 

Here we outline two procedures to pick values of q s close to optimal. To simplify, 
we assume that q v is fixed in the heuristic module with dP 
assume that g = 1 on its support so that £ 
handled with additional hypotheses. 
Note that 



heu,q v • 



As before, we 



to although general g could be 



d pa 



dP fi 



(1 



dP" 



dP 3 



+ 



dP 



heu,q v 



dP a 



and that the quantity to minimize with respect to q s is thus 



d pa 



dP 



Id 



dP a 



dP 



dP 8 



We split the above into integrals over B and D\ B, where B is the set of paths 
of particles that reach the detector without undergoing volume scattering (but 
can have many interactions with the boundary). We assume for simplicity that 
supp(P ft ) = B, i.e., that discretization effects do not significantly modify the 
support of P s (otherwise, B should be thought as the support of P h ). 

On B, we find that for any subset B' C B, we have P h (B') > P heu ,q v { B ') = 
P a (B') (at least when neglecting discretization effects). The reason is that the 
paths reaching the detector after interacting with the boundary have very high 
probability density dP h (this is exactly the role of dP h : sending particles in- 
teracting with the boundary toward the detector). However, for such paths, 



Pheu,q v (B') = P a (B') since heuristic sampling only modifies those paths that 
undergo volume scattering. For u G B, we thus find that 



dP a 



dP n 



1 — 1 


dP a 


1 - 5s 


dP h 



(uj)-e(uj), 



with < e(cu) <C 1. On D\B, dP h = since paths with volume scattering have 
vanishing weight under the boundary measure dP h . As a consequence, we have 
that 



1 — 5s JBnD 



dp a 



dP" 



dP a + 



1 / 

5s Jd\b 



dP a 



dP 



heu,q v 



dP a 



(42) 



The above can be optimized over q s once the two integrals are known. Since they 
are both expectations (with respect to dP a ), we can estimate them with an analog 
simulation, or send particles using dP q , and use importance sampling. In this way 
our optimal choice of q s can be refined as more particles are sent. We find 



(5. 



s Jopt,l 



yfo. 



1 + v 7 ^' 



a :- 



' D\B 



dP a 



dP h 



eu,q v 




BnD 



dP a 



dP h 



dP 3 



Alternatively, and because calculating the integrals in the definition of a is 
still difficult, we may obtain another guess using only a priori estimates of P a [D] 
and P a [B | D] . This can be done in the simplified importance sampling framework 
of section 3.1.2, specifically the regime (16). 

First we define b £ R as the constant that makes P g [-B] = 6P a [-B]. Second, we 
recall that P a [B] = P heu ^ v [B]. Thus, using (41), we find 

bP a [B] = P q [B] = (1 - q s )P h [B] + QsP^B}. 
Using the approximation P h [B] m 1 (neglecting discretization effects), we have 

1 — 5s 
b *P^B] +qS ' 

and thus 

^ 1 - bP*[B] = l-(bP*[D])P*[B\D] 
qs ~ l-P a [B] l-P a [£)]P a [5|D] ' [ ' 

Assuming that dP q (oj) = 6dP a (u;) on B, we are approximately in the regime (15). 
We thus choose 6 op t = P a [-D] _1 /3o P t minimizing (16) with /3 opt = (^(y/j+V^))^ 1 , 
where a and 7 are defined in (17). Then, we find (q s )opt,2 in terms of 6 pt using 
(43). See section 4.3 for an implementation of this algorithm. 



4 Numerical Results 

In this section, we implement the scheme described in the preceding section and 
sample chains numerically based on the measure dP q for several values of q and h. 
We compare the variance of the method with the survival biasing measure dP s &. 
Several details of the implementation are described in section A. 4. 

The rotation 1Z described in section 3.3.2 were found to have an extremely 
limited effect on the calculated solutions. Even with coarse grids, neglecting the 



rotations (setting them to the identity matrix) led to less than 0.1% bias (the bias 
was so small that it could have been error due to not firing enough shots). See 
also remark 3.3. So the presented results are obtained with the rotations set to 
identity. 

We first introduce the notion of speedup (a.k.a. figure of merit) in section 4.1. 
We consider two speedups depending on whether the cost of the deterministic 
adjoint solution is included or not. Then in section 4.2, we show the influence of 
the discretization parameter h on the convergence of the variance to (and the 
speedup to infinity) in the absence of volume scattering (a = 0) and compare the 
numerical results with theoretical predictions. Finally, in section 4.3, we include 
volume scattering and obtain significant variance reductions by appropriate choice 
of the regularization parameters (q s ,q v )- Moreover, we show that large speedups 
are obtained for a relatively large band of values of (q s ,qv), whose optimal values 
very much depends on geometry/scattering/absorption and has to be obtained 
fairly empirically. 



4.1 Speedup (figure of merit) 

For all of these methods, define the approximation after N random draws 

1 N 

M6: = t?E£«>- 



N 

n=l 



The RMS estimation error e is given by 

s(0 : = y/E{\I N (£)-(u, g)n = 

For a given error level e, the required number of MC draws is then N(e, £) := 
Var{£} je. The required simulation time T(e, £) for one estimation of E{£} is 
given by 

r(0Var{£} 



T(e,0: = M0 + r(0N = T Q (0 + 



.2 



where ?o(£) is the time needed to compute the deterministic adjoint solution (e.g. 
at level h when £ = £ h ), and t(£) is the expected time for one draw using the 
appropriate measure for the random variable £. We foresee the use of SAI in 
situations where the boundary remains fixed, but the volume changes (due to 
e.g. moving clouds over a fixed surface). We therefore consider the time for m 
simulations using one boundary, 

T(e, £, m) : = T (£) + m T {i)N = T (£) + { ® , 

Then we compare schemes through the "Speedup." 

q a ( c t ^ e 2 r (fr ) + mr (ft )Var{ei} 

Speedup(£i,£ 2 ,£,m) : = — — = 2 —-— — r—r- 

T(e, Z 2 ,m) e 2 T (£ 2 ) + mr(£ 2 )Var {£ 2 } 

For a deterministic approximation of we expect To(£) ~ C (^)h~ 2 ( d ~ l \ We 
in fact measure (with d = 2) To(£ h ) ~ 0.017/i" 2 . Our "benchmark" scheme is 
survival biasing. Since requires no deterministic solution, the relevant ratio is 

o j it cq \ mr(£ s& )Var {£ s& } 

Speedup(£ s6 ,£ 9 ,e,m) = — ^ — — . 

(f) 2 C + mr(^)Var{^} 

We measured speedup when either m = 10 or m = oo ( "Ignoring deterministic 
solve"). 



4.2 Variance reduction without volume interactions 



When the volume mean-free-path is infinite, we can show that the variance ap- 
proaches zero as h — > 0. 

First consider the case of a flat boundary. Photons leave the sky, hit the 
boundary, then either reach the detector or are "absorbed" by the sky or sides. 
Neglecting edge/detector overlap we are in the regime of assumptions 3.1. There- 
fore, combining lemma 3.1 with theorem 3.5 we expect approximately 0(h 2 ) con- 
vergence. In practice we observed 0(h ls ) convergence. See figure 4. 



Flat mountain, MFP = oo cos 3 mountain, MFP = oo 




Figure 4: 0(h a ) variance behavior. Complicated cos 3 mountain results in slower con- 
vergence. 

Second, when the more complex cos 3 boundary of figure 3 is used, we require 
q s > for the following reason: Suppose a photon finds itself at the point x = 
(—0.4, 1 + cos 3 0.4). On a fine boundary, there is a point nearby that has a direct 
line to the detector. Therefore, when @ h (x, v'— >v) will allow for shots directly to 
the detector. One can see this by noticing that in figure 4.2 the point (—0.4, 1 + 
cos 3 0.4) is shaded darkly in the fine boundary (right), indicating that it sees direct 
illumination from the detector. On a coarse boundary (left) this is not the case. 
In practice we observed a major contribution to variance due to these effects, and 
0(h) convergence overall. See figure 4 and also Remark 3.2. Also, (not pictured) 
we observe that with a flat boundary and fine discretization is used, the optimal 
regularization parameter is q s = 0. When a coarse discretization or cos 3 boundary 
is used, the optimal q s ^ 0. 

4.3 Variance reduction with volume interactions 

To analyze the variance of the SAI chain in the presence of volume interactions, 
we adopt the modularity viewpoint explained in section 3.1.2. Note that even 
when the error (u, g) — {ip^, s h ) is high, we still get good variance reduction. See 
figure 6. This emphasizes the point that the quality of the deterministic solve is 
not so important in a modular scheme. 

Our implementation swept both q s and q v . As expected, we see decreasing 
speedup with increasing volume scattering strength a. See figure 7. 

It is important to note that use of adjoint-enhanced surface scattering, and 
heuristic volume scattering (q s < 1, q v < 1) together is especially helpful. In fact, 
even with a small MFP=1.3xDiameter, we realize good speedup when q s = 0.9, 




Figure 5: Boundaries discretized on coarse (left) and fine (right) scales. Dots indicate 
adjoint flux at mesh points. Size is relative to flux strength. 




Figure 6: \(u, g) — (s h , ipg)\/(u, g) is generally lower for smaller h. However, speedup 
is still very good even for large h. 



q v < 1. Note that if either q s = 1 or q v = 1 (so no use of dP or heuristic scatter- 
ing adjustment), speedup almost disappears. Following the setting described in 
section 3.1.2 and the discussion at the end of section 3.1.3, this may be explained 
as follows. Assuming that D is well approximated by B\ U B^ where B\ and Bi 
have approximately the same size and B\ n B2 ~ 0. Then the maximal variance 
obtained by choosing only B\ or only B2 in the importance sampling is roughly 
a factor 2 whereas the maximal variance obtained by choosing both of them is 
very large. When the computational cost of the deterministic solve is taken into 
account, we obtain the results in figure 7, which show that both boundary and 
volume scattering need to be accelerated in order to obtain significant speedups. 

4.4 Optimization of the parameter q s 

We now compare the a priori estimates of an optimal q s (computed using the 
methodology in section 3.4.2) with the observed optimal values (from figure 7). 
To compute the a priori estimates we need estimates for P a [J3], P a [-B | D]. First we 
assume P a [i? | D] f» 1 — P a [V], where the set V are the photons that had a volume 
interaction before dying. Since P a [F] is rather large, it is easy to estimate with a 
very short analog simulation. We found that MFP = 16 x Diameter corresponded 




Figure 7: Speedup when using both surface adjoint approximation ^ (with parameter 
q s ) and heuristic volume scattering (with parameter q v ) 




Figure 8: A priori Optimal q s (a priori estimate and observed) for q s is plotted for a 
variety of P a [£|Z>]. 



to P a [V] = 1/21, and MFP = (8,2.7, 1.3)x Diameter corresponded to P a [V] = 
(1/11, 1/4, 1/2.35) respectively. This gives us estimates of P a [B \ D}. Now we need 
an estimate of P a [L>]. The true values lie in the range [0.002325,0.002484]. This 
can be estimated fairly quickly (to within 10% RMS error) using 25,000 survival 
biased shots. In figure 8 we plot the a priori optimal q s versus MFP/Diameter 
using the above estimate for P a [i? | D] and setting P a [D] = 0.0024 (10% errors in 
P a [Z?] make very little difference). 



A Appendix 

This section collects details left out in the preceding sections. 

A.l Proof of theorem 3.1 

Proof of theorem 3.1. For a proof of the theorem in the absence of a boundary, 
we refer the reader to [16]. The analog probability density is defined in (12). If 
Y : Q — > E is a random variable, then 

oo „ oo 

E «{ y } = E/ ^dP a = ^E a {yi r= „}, (44) 

n=l Jr=n n=l 

where, following the structure in algorithm 1, we have 

E a {Yl T=n } = E a {E a {Yl T=n | Z n _ ± }} 

= E a {E a {E a {Yl T=n | Z n _i, Z n _ 2 } | Z n _ 2 }} 

= E a {• • • E a {E a {Yl T=n | Z„_i, • • • , Z } | Z n -2, ■ ■ ■ , Zq} ■■■ \ Z } ■ 
Using (5), (7), and (44), it will suffice to show 

E a Ualr=n} = (C*g, (KL) n - l S ), U = 1, 2, . . . 

First note that (since g is a boundary source extended to be zero off of T + ) 

Ea {£alr=„ | Z„-l} = / ^ ^ fcg. . (Z w -i -> X„)p|.(x n ) dx„ 
JX Ps*\ x n) 

= 5(x + (Z„_i),K-l)^ ( r(^n-l,a:+(2 , n-l)) = C*<7(Z„_i). 

So when n = 1, we have 

Ea {£ a lr=l} = E a {E a {£^=1 \ Z }} 

= E a {C*g(Z )} = J_ s(z )C*g(z ) dz = (C*g, s) r _ . 
Next note that for m < r, 

E a {f(Z m ) | Z m _i} = j_f(z)k a T ,{Z m ^ -> z m )dz m = C*S*f{Z m -{). 
So when n > 1, we have 

Ea {&lr=n} = E a {E a {£ a lr=n I ^r-l}} = E a {C*g(Z T ^)} 

= E a {E a {C*g(Z T ^) | Z T _ 2 }} = E a {(C*S*)C*g(Z T - 2 )} 

= E a {(C^T^C*^)} = J s{z,)(C*S*) n - l C*g{z,) dz 
= (s, iCS*)"- 1 ^) = ((KL) n -\ C*g). 
This proves the theorem. □ 



A. 2 Proof of theorem 3.5 

We note that 

dP 



i h = g(x T ,v T -i, 



(u, g)(l + e h ), 



n . / n (s h ,4>o) g(x T ,v T -i) s(zq) 

1 + e h (u>) : = { ^ g) - gh{Xr:VT _ l)sh{zo) ^o, , * r )7«*(*i, • • • , *r-i). 

We now bound the coefficient error e^. First, assumptions 3.1 (i), (ii) give us 

(1 - Ch) T+1 < 1 + e h < (1 + Ch) T+1 . 
Using the binomial theorem, we have (for x > 0, m € N) 

1 - mxe ma; < (1 - x) m < (1 + x) m < e ma; . 

Therefore 

(1 - Ch) T+1 < 1 + e h < (1 + C/i) T+1 , 

and thus 

N < hC[r + l}e hC[T+1] . 
Since we assume ~P h [r = n] < Ce~ pn , we have 

oo „ oo „ 

Var U h \ =J2 (£ h ~ <«. 2» 2 dP ft = (n, 5 > 2 £ / | £/l | 2 dP ft 

n^" 7 ™ n=0 Jr=n 

oo 

< («, 5 ) 2 /l 2 C 2 ^[r + l]2 e -(pr-2ftC[r+l])_ 
n=0 

So that, for /t < p/(2C) the above series converges and the result is proved. 

A. 3 Proof of some technical results 

Proof of Lemma 3.1. Clearly d v N is C k+2 when \x' — x\ > 0, so we may restrict 
our attention to \x' — x\ < e. We prove the lemma then for x', x both in an e 
neighborhood of some point. After possibly shrinking e, we may assume that in 
this neighborhood dX is the graph of a C k+2 function /. In other words, with 
x = (xi,X2), and after a rotation and/or translation, this neighborhood is the set 
{(xi,f(xi)) ■ -e <x 1 < £■}. 
We then have 

x - x = (xi - x[,f{xi) - f{x[)), 



Vi + (/'(^i)) 2 ' 

and also 

f{x'i) = f{xi) + f'(x 1 )(x' 1 - xi) + 
12(xi,xi):=/ / f"(t)dtds = (x' 1 -x 1 ) 2 / / /, (t(x , 1 -x 1 ))dtds. 

JO •/ 



We notice that 

R{x 1 ,x' 1 )(x[ - xi)~ j G C k {dX x dX), j = 0,1,2. 
This is all we need since 



v x ■ (x - x) 



vi + (m)) 2 ' 



\x — X 



/|2 



I \2 



d u N(x,x') : = 



(xi - x x ) 
v x ■ (x - x') 



V x\ - xi J 



x — X 



'12 



□ 



Proof of theorem 3.6. Our setup so far puts us in the regime of section 3.3.1 with 

E$ = E a = l, g h = Rg, a h = Ra, 

\dX -I 

Q h (x' i ,v^-v' ij ) = K{xi,v i:j )d u N{xi,Xj) 3 , for x\ G dX^v'^ G Vij. 

Assumptions (iii), (iv) are the same above and in assumptions 3.1. Using 
assumption (i) above along with proposition 3.1, we have assumption 3.1 (i). It 
remains to prove that assumptions 3.1 (ii) is met. Due to assumption (i) above, 
it will suffice to show 



l-C'h< 



\dXj\d v N(xi,Xj] 



< 1 + C'h. 



(45) 



Due to strict convexity of dX, d u N is bounded below. Now the differentiability 
of d v N (lemma 3.1) implies that there exists C > such that when x' G dXj, 



i-c'h< d : N ^ x '\ <i+c'h. 



d u N(xi,Xj) 



(46) 



Therefore, since 



Vij\ = / d u N(xi,x')dfi(x') = d„N(xi,Xj) I 

JdX, Jd 



d u N(xj,x') 
Qx d v N(xi,Xj] 



dfi(x'), 



(46) now implies (45) and the proposition is proved. 



□ 



A. 4 Parameter choices in numerical simulations 

In the simulations performed with a = (no volume interactions), we used both 
a flat surface (so that our domain was [— tt, tt] x [2,4]) and a cos 3 surface (figure 
3). We swept h, with 0.002 < h < 0.2. We did not use any heuristic scattering 
adjustment (q v = 1.0). In all cases 

U{x,y,v^v)-<^ Q ^ otherwise. 
The source was mono-directional v = —ir/2 and given by 

( ln s Jl |x|<2.5, 



In the simulations involving volume interactions (a > 0), we used a cos 3 type 
surface. We computed speedup in a variety of cases. The mean-free-path MFP= 
ex -1 was varied as well as q s , h, and q v . We swept 0.002 < h < 0.15. In all 
cases the volume scattering coefficients were constant with a s = 2o~ a . The volume 
scattering was given by 

0(x,v->v') = l + (vv') 2 . 

The other coefficients were chosen to have features (in this case oscillations) on 
a scale coarser than the fine values of h, and finer than the coarse values. The 
surface scattering coefficient was given (on the mountain) by 

( x > 2.5, 

@((x, y), u-W) = (y x ■ v') I 0.75 + 0.25 sin(2vrx/0.05) 1 < x < 2.5, 

1^0.35 + 0.25sin(2vrx/0.05) -2.5 < x < 1, 

when v x -v' > 0, and when u x -v'< 0. Off the mountain there was no scattering 
(perfectly absorbing). The source was mono-directional v = —n/2 and given by 



s(x,-n/2) 



1 + 0.25sin(27rx/0.07) \x\ < 2.5, 
\x\ > 2.5. 
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